Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7689] Remove TTL-based metadata cleaning in Spark 2.0 #10534

Closed
wants to merge 13 commits into from

Conversation

JoshRosen
Copy link
Contributor

This PR removes spark.cleaner.ttl and the associated TTL-based metadata cleaning code.

Now that we have the ContextCleaner and a timer to trigger periodic GCs, I don't think that spark.cleaner.ttl is necessary anymore. The TTL-based cleaning isn't enabled by default, isn't included in our end-to-end tests, and has been a source of user confusion when it is misconfigured. If the TTL is set too low, data which is still being used may be evicted / deleted, leading to hard to diagnose bugs.

For all of these reasons, I think that we should remove this functionality in Spark 2.0. Additional benefits of doing this include marginally reduced memory usage, since we no longer need to store timetsamps in hashmaps, and a handful fewer threads.

@JoshRosen JoshRosen changed the title [SPARK-7689][WIP] Remove TTL-based metadata cleaning [SPARK-7689][WIP] Remove TTL-based metadata cleaning in Spark 2.0 Dec 31, 2015
@jerryshao
Copy link
Contributor

Good to remove it :). It has been obsolete for a long time, and will potentially lead to some unexpected behaviors especially in Streaming (like block not found).

@SparkQA
Copy link

SparkQA commented Dec 31, 2015

Test build #48528 has finished for PR 10534 at commit 98b732a.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jan 1, 2016

LGTM. We should let @tdas take a look at this though.

@@ -81,6 +81,7 @@ class StreamingContextSuite extends SparkFunSuite with BeforeAndAfter with Timeo

test("from conf with settings") {
val myConf = SparkContext.updatedConf(new SparkConf(false), master, appName)
// TODO(josh): Update these exmaples to use a different configuration.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO.

@SparkQA
Copy link

SparkQA commented Jan 2, 2016

Test build #48592 has finished for PR 10534 at commit e6482fa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 2, 2016

Test build #48590 has finished for PR 10534 at commit 5ffe30f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 3, 2016

Test build #48601 has finished for PR 10534 at commit 1d96791.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor Author

Jenkins, retest this please.

@gatorsmile
Copy link
Member

All the recent test builds failed due to the same issue.

@SparkQA
Copy link

SparkQA commented Jan 3, 2016

Test build #48616 has finished for PR 10534 at commit 1d96791.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Jan 4, 2016

Test build #48645 has finished for PR 10534 at commit 1d96791.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

+1

@JoshRosen JoshRosen changed the title [SPARK-7689][WIP] Remove TTL-based metadata cleaning in Spark 2.0 [SPARK-7689] Remove TTL-based metadata cleaning in Spark 2.0 Jan 4, 2016
@@ -75,7 +77,7 @@ private[spark] class BlockManager(

val diskBlockManager = new DiskBlockManager(this, conf)

private val blockInfo = new TimeStampedHashMap[BlockId, BlockInfo]
private val blockInfo = new ConcurrentHashMap[BlockId, BlockInfo]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isnt it simpler to keep the Scala map interface? Will minimize changes in rest of code in this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried about the pitfalls of .getOrElseUpdate not being atomic on ConcurrentHashMaps that had been wrapped into Scala maps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

@SparkQA
Copy link

SparkQA commented Jan 4, 2016

Test build #48693 has finished for PR 10534 at commit 0868fab.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1439,7 +1428,7 @@ Apart from these, the following properties are also available, and may be useful
<td>
A comma separated list of ciphers. The specified ciphers must be supported by JVM.
The reference list of protocols one can find on
<a href="https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https">this</a>
a href="https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https">this</a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this < removed??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, thanks for catching that.

@tdas
Copy link
Contributor

tdas commented Jan 6, 2016

LGTM, pending tests.

@SparkQA
Copy link

SparkQA commented Jan 7, 2016

Test build #48878 has finished for PR 10534 at commit 9704bc2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Jan 7, 2016

Test build #48889 has finished for PR 10534 at commit 2451963.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jan 7, 2016

Merging. Thanks.

@asfgit asfgit closed this in 8e19c76 Jan 7, 2016
@JoshRosen JoshRosen deleted the remove-ttl-based-cleaning branch August 29, 2016 19:21
@JoshRosen
Copy link
Contributor Author

Cross-references to related PRs (to aid other code archaeologists):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants